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Abstract. We consider how to generate chemical reaction networks (CRNs) 
from functional specifications. We propose a two-stage approach that 
combines synthesis by satisfiability modulo theories and Markov chain 
Monte Carlo based optimisation. First, we identify candidate CRNs that 
have the possibility to produce correct computations for a given finite 
set of inputs. We then optimise the reaction rates of each CRN using a 
combination of stochastic search techniques applied to the chemical master 
equation, simultaneously improving the probability of correct behaviour 
and ruling out spurious solutions. In addition, we use techniques from 
continuous time Markov chain theory to study the expected termination 
time for each CRN. We illustrate our approach by identifying CRNs for 
majority decision-making and division computation, which includes the 
identification of both known and unknown networks. 
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1 Introduction 

A central goal of molecular programming is to be able to implement arbitrary 
dynamical behaviours. Chemical reaction networks (CRNs) are a popular for¬ 
malism for describing biochemical systems, such as protein interaction networks, 
gene regulatory networks, synthetic logic circuits and molecular programs built 
from DNA. Extensive theoretical understanding exists about the behaviour of a 
multitude of CRNs, and the behaviour of some networks has been exhaustively 
explored [1]. Besides describing chemical systems, CRNs provide a common 
language for expressing problems studied in computer science theory (e.g. Petri 
nets, network protocols) as well as control theory and engineering. Methods 
exist to convert CRNs into equivalent physical implementations, based on DNA 
strand displacement [2,3] the DNA toolbox system [4] and genelets [5]. There¬ 
fore, we sought to develop a methodology for proposing candidate CRNs that 
exhibit a pre-specified behaviour. 

The computational power of CRNs has been extensively studied [6]. It is 
known that error-free (stably computing [7]) CRNs compute exactly the class of 
semi-linear functions [8,9]. However, if the stability restriction is relaxed and we 



allow the CRN to sometimes compute the wrong answer then it is possible to 
implement a register machine, that is, CRNs with error can compute functions 
beyond the semi-linear class (indeed they are equivalent in power to Turing 
machines) [10,6]. 

Although there are procedures to generate CRNs for semi-linear functions 
[8,10], primitive recursive functions [6], or even from arbitrary Turing ma¬ 
chines [6], the proposal of practical (i.e. experimentally implementable) CRNs 
that compute a given function has thus far mostly been a manual effort. In this 
work, we attempt to automate the proposal of CRNs, by formally specifying a 
behaviour and automatically identifying CRNs that satisfy the desired behaviour 
with high probability. First, we formalise the problem of identifying CRNs that 
have the capacity to produce correct, finite computations for a given finite set 
of inputs. This corresponds to a synthesis problem, as opposed to verification, 
where the goal is to determine the correctness of a given CRN [11]. We express 
CRN synthesis as a satisfiability modulo theories (SMT) problem, which can be 
addressed using solvers such as Z3 [12]. This allows us to generate a number 
of canditate CRNs or to prove that no such CRN of a given size (in terms of 
numbers of reactions, species and computation lengths) exists. However, while 
the existence of correct computations is guaranteed for each generated CRN, the 
probability of these computations might be low. 

To determine whether correct computations can occur with high probability, 
we next optimise the reaction rates of each generated CRN. To solve the opti¬ 
misation problem, we combine stochastic search strategies based on Markov 
chain Monte Carlo (MCMC) with numerical integration of the chemical master 
equation (CME). This part of the problem was recently addressed in [13,14], 
though applied only to a single input. 

In this paper, we specifically focus on uniform CRNs, those that have a fixed 
number of species and reactions for all input sizes. We also restrict our attention 
to bimolecular CRNs, where there are precisely 2 reactants and 2 products in ev¬ 
ery reaction. Bimolecular CRNs are equivalent to Population Protocols (PPs) [7] 
and also guarantee that mass is conserved in the system. We applied our two- 
step approach first to majority decision-making, in which the network seeks to 
identify which of two inputs is in an initial majority. Majority networks are well- 
studied in the literature, and there are many known CRNs that give approximate 
solutions [15,16,17]. We then applied our approach to division, a non-linear 
function which has been relatively less studied. We show a range of CRNs for 
majority and division identified automatically using our method, some of which 
have been identified and characterised previously, though some of which are 
entirely novel. This illustrates the potential for automatically determining CRNs 
with a specified behaviour. 

2 Preliminaries 

A chemical reaction network (CRN) is a tuple C = (A, 1Z), where A = {so, 
and 1Z = { J'o,..., r m } denote the finite sets of species and reactions, respectively. 
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A reaction is a tuple r — (i r , p', k r ) where r' and p' are the reactant and product 
stoichiometry vectors (r£ G No and p£ G No denote the stoichiometry of each 
species s G A), k r G 1R>q denotes the rate of r and k denotes the vector of 
all reaction rates. Given a reaction r — (r',p r ,k r ), the set of reactants of r is 
{s G A | Tg > 0} and the set of products of r is {s G A ! Ps > 0}. In this paper, 
we focus on the class of bimolecular CRNs, where r s — 2 and X] s ga Ps — 2, 
for all reactions r G 7J. 

The dynamical behaviour of bimolecular CRNs can be understood as follows. 

The set of all possible system states is X = IN (| A , where a state x G N () A 
represents the number of molecules of each species. We denote the number of 
molecules of species s G A at state x by x s . Given a reaction r G 1Z where r r s — 2 
for some s G A, the propensity 1 of r at x is k r x — k r ■ Xs '(^ s ~ 1 ' _ jf / on the other 
hand, r r s = rh = 1 for some species s, s', the propensity of r is k' x — k r ■ x s ■ x s i. 
The time at which reaction r would fire, once the system enters state x G X, is 
stochastic and follows an exponential distribution with a rate determined by the 
reaction's propensity k r x . Assuming that reaction r is the first one to fire, the state 
of the system is updated as x' s — x s — r' s + pg for all s G A, where x and x' are 
the current and next states. 

An abstraction of CRNs that preserves reachability but does not consider 
reaction rates or time is given by the transition system T c — (X, T), where the 
transition relation T is defined as 

Mx,x' G X . T(x,x / ) -H- \/ /\ (x s > r' A x' s — x s — r r s + pg) • (1) 

naTlseA 

In other words, the choice between reactions from 1Z is non-deterministic but 
enough molecules of each reactant must be present in state x for the reaction to 
fire. The transition between states x and x' happens when any reaction r G JZ 
fires and the number of molecules is updated accordingly. A path x$, x \,... of T 
satisfies T (x„ Xi + \ ) for i — 0, 1, ... and, given an initial state Xq we call state x f 
reachable from Xq if there exists a path Xq, ..., Xf. 

Given a CRN C, let Xo C X denote a finite set of initial states and X r C X 
denote the set of states reachable from Xq. Assuming that X r is finite, C can be 
represented as a continuous time Markov chain (CTMC) that preserves informa¬ 
tion about the transition probabilities and rates that determine the stochastic 
behaviour of the system and the expected execution times. We define a CTMC 
to be a tuple A4 — (X r , 7 Zq, Q), where X,- is a finite set of states, tiq : X r -A- ]R 
is the initial distribution of molecule copy numbers of all species, and Q : 
X r x X r —»IR is a matrix of transition propensities. While the set of initial states 
is not represented explicitly, it is captured through the initial distribution, i.e. 
Xo — {x e X r \ ttq(x) > 0}. A CTMC M c is constructed from a CRN C by first 
determining the set of reachable states, and then evaluating the propensities of 
each reaction. The (i,j) th entry of Q, q v j, represents a transition from state x, to 

1 We assume that the reaction volume is 1 to allow for later volume scaling e.g. k r x /v is 
the propensity for a reaction volume equal to v 
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state Xj. Accordingly, qp is the remaining probability mass, equal to — qij- 

The transient probability vector Ttt evolves according to — nt Q, which is 
known as the chemical master equation (CME). 

Following [13,14], a parametric CTMC (pCTMC) is a CTMC where the re¬ 
action rates are parameterised by k, as above. Denote by V the parameter 
space, V : R> 0 , such that k is instantiated by a parameter point p A V. Ac¬ 
cordingly, given a pCTMC M. and parameter space V, an instantiated pCTMC 
M.p — (X, 7To, Qp) is an evaluation at point p € V. 

3 Problem formulation 

The main problem we consider in this paper, which we formalise in this section, 
is the identification of CRNs that satisfy given properties. Specifically, we are 
interested in finite reachability properties, which capture a range of interesting 
CRN behaviours. 

Let C = (A, 7L) be a given CRN and T c = (X, T) and M c = (X r , n 0 , Q) 
denote its transition system abstraction and CTMC representation, as discussed 
in Section 2. Let cp : X —» B denote a state predicate, constructed using 

(p ::= E b 

E b true \ false \ E c \ ^E b \ E b \> E b where > € {A, V, =>, <=>} 

E c ::= E n > E a where D> € {<,<,=,>,>} 

E a ::= s G A | c € Z | £„ > E„ where > € {+,—,*}. 

For example, if cp s >5, then (p(x) denotes that x s > 5. 

In this paper, we consider path predicates O = (ipo, cpp), which are expressed 
using two state predicates that must be satisfied at the initial ((po) and at some 
final (cpp ) state of a path. Let K denote the number of steps we consider. 

Definition 1. Given a finite path p : xq. . .x^ of T c we say that p satisfies path 
predicate <J> = (^O/^f)/ denoted as p \= <J>, if and only if(po(xo) A <Pf(xk) evaluates to 
true and no reactions are enabled in xk (i-e. xk is a terminal state) 2 . 

We define the probability of <J>, denoted P<j>, using as follows. Let Xo = 

{x G X | (po(x)} denote the set of states that satisfy the initial state predicate. 
We initialise with a uniform sample from the states that satisfy (po, which 
defines tiq as 

7T 0 (x) = { A ^ ^ X ° 

0 otherwise 


2 We consider terminating computations by enforcing that no reactions are enabled at 
the state that satisfies (pp. Alternative strategies possible within our approach could 
consider reaching a fix-point (i.e. the firing of any enabled reaction does not cause 
a transition to a different state), or reaching a cycle along which (p F is satisfied, to 
guarantee that the correct output is eventually computed and remains unchanged by 
any subsequent reactions. 
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Similarly, Xf = {x G X | cpp(x)} denotes the set of states satisfying the final 
state predicate. 

Definition 2. The probability o/<f> is defined as 

p® = nt{x), 

x€X F 


where t denotes the maximal time we consider and nt is the probability vector at time t 
computed using the CME introduced in Section 2. In other zvords, zve define P& as the 
average probability of the states satisfying cfp at time t. 

Note that it is possible to optimise for both speed and accuracy by, for example, 
defining P<j> to be the integration of the probability mass of all states satisfying 
([){- from time 0 to time t. 


Problem 1. Given a finite set of path predicates {by,... ,<!>„}, find a bimolecular 
CRN C such that 


1 . 

2 . 


for each <f>„ there exists a path p, of T c , such that p, 1= <f>, and 


the average probability 


EjU % 

n +1 


defined using A4 c is maximised. 


4 Synthesis and tuning of CRNs 

We solve Problem 1 by addressing each of the two subproblems separately. First, 
we generate a number of CRNs that satisfy the specifications from Problem 1.1 
using a satisfiability modulo theories (SMT)-based approach (Section 4.1). The 
CRNs identified at that point are capable of producing a path that satisfy each 
path predicate, which addresses Problem 1.1 but they might also include incor¬ 
rect paths and the probability of correct computations might be low. Therefore, 
we tune the reaction rates of these CRNs in order to maximise the average 
probability (discussed in Section 4.2), which addresses Problem 1.2. 


4.1 SMT-based synthesis 

Here, we present our approach to finding a bimolecular CRN C that satisfies 
a specification expressed as path predicates {Ty, • ■ ■, 4> n } (Problem 1.1). We 
address this problem by encoding T C symbolically for any possible bimolecular 
CRN C = (A,1Z) where \1Z\ — M and |A| — N (i.e. the number of species 
and reactions is given), together with the specification {<J>o,..., <h„} for some 
finite number of steps K, as a satisfiability modulo theories (SMT) problem. We 
then use the SMT solver Z3 [12] to enumerate bimolecular CRNs that satisfy the 
specification or prove that no such CRNs exists for the given N, M, and K. Finally, 
we apply an incremental procedure to search for CRNs of increasing complexity 
(larger N and M) or to provide more complete results by increasing K. 

Using Z3's theory of linear integer arithmetic, we represent the stoichiometry 
of C as two symbolic matrices r e N^ xN and p e N^ xN (using integer 
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constraints to prohibit negative integers). Given a reaction r G 1Z and species 
s € A, rj (pj.) defined in Section 2 is now encoded as a symbolic integer. We 
ensure that only bimolecular CRNs are considered by asserting the constraints 
A/^o 1 EjI'o 1 r A = ^ ar *d A/^o 1 ^M) 1 Pi,; — 2. In addition, we introduce the 
following constraints. 

- We label a subset of the species Aj C Aas inputs and assert that Asga, Vre TZ r s > 

0 to ensure all inputs are consumed by at least one reaction. 

- We label a subset of the species Aq C A as outputs and assert that A s ga 0 Vreft Ps > 
0 to ensure all outputs are produced by at least one reaction. 

- We assert that t\ r ye_TZ,r^r’ VsgaPs A Ps VrJ ^ r£ to ensure that two re¬ 
actions never have the same reactants and products and, therefore, all M 
reactions are utilised. 

- Finally, we assert that Are n V s eA Ps A r s to ensure that the firing of each 
reaction updates the state of the system. 

Following an approach inspired by bounded model checking (BMC) [18], 
we represent the finite path p, — x l 0 ,...x l K for each O, by defining each state 
as a symbolic vector x l - G IN ^ and "unrolling" the transition relation of Tq 
(i.e. asserting the constraint T(xt, xj +1 ) for each i — 0... n and j = 0... K — 1). 

For each path predicate by = (<po,(j>F) and path p, we then assert the con¬ 
straint (po(x [,) A c Pf(x ‘ k ) A Terminal(x l K ) according to Def. 1, where Terminal(x) = 
f\reii Vsga x s < r s' i- e - no reactions are possible due to insufficient molecules of 
at least one reactant. 

The parameter K specifies the maximal trajectory length that is considered. 

The BMC approach is conservative, since computations that require more than K 
steps (reaction firings) to reach a state satisfying <pp will not be identified. In¬ 
creasing K leads to a more complete search, and indeed the approach becomes 
complete for a sufficiently large K determined by the diameter of a system, but 
also increases the computational burden. To alleviate this, we follow an approach 
from [11] and consider stutter transitions (corresponding to multiple firings of 
the same reaction in a single step) by using the following modified transition 
relation definition T s t (as opposed to T from Eqn. 1) 

Vx,x' G X . T s t(x,x') GA (Terminal(x) A x — x') V 

3« G N . \/ f\ (x s > t r s A x s > n ■ (r r s - p r s ) A x' = x s + n ■ (p r s - r r s )) . 
re7ZseA 

For any enabled reaction r (x s > A), Tst allows r to fire up to n times in the stutter 
transition, n is limited by the consumption and production of the species needed 
for the reaction to fire (x s > /? ■ (r( — p()). In many cases, stutter transitions 
dramatically decreases the required trajectory lengths (K), since multiple copies 
of the same species can react simultaneously. However, this is not restrictive, 
since for n — 1 the original definition of T is recovered. In addition to such stutter 
transitions, T s t allows self loops at terminal states, and therefore computations 
that require less than K steps to reach a state satisfying <pp can also be identified. 
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The encoding strategy described so far allows us to represent CRN synthesis 
as an SMT-problem and apply an SMT solver such as Z3 [12] to produce a CRN 
that satisfies the specification or prove that no such CRN exists for the choice 
of M, N and K. More specifically, a solution CRN C is represented through the 
valuation of r and p, which are extracted from the model returned by Z3. 

In general, we are interested in enumerating many (or all possible) CRNs 
for the given class (defined by M, N and K), which ensures that no valid solu¬ 
tions are omitted at that stage. To do so, we apply an incremental SMT-based 
procedure, where at each step we assert an uniqueness constraint guaranteeing 
that no previously discovered CRNs are generated. Given a concrete, previously 
generated CRN C = (A, 7V) and the new symbolic CRN C — (A, 72.) we are 
searching for (both of which are defined using the same species A), we define 
the constraint DifferentFrom(C') = ->/\re 7 1 Vr’eTZ' r — r ’ > where r — r’ if and 
only if r£ = t r s A p( = p[ for all s G A. The new CRN C cannot simply be a 
permutation of the same reactions 3 . We start by generating a solution C (if one 
exists), asserting the constraint DifferentFrom(C'), and repeating this procedure 
until the constraints become unsatisfiable, which corresponds to a proof that not 
additional CRNs exists for the given N, M, and K. 

4.2 Tuning CRNs with parameter optimisation 

Here, we present our approach to optimising the reaction rates for CRNs satisfy¬ 
ing {<J>o, This becomes a parameter synthesis problem over a pCTMC 

set, analogous to parameter synthesis for a single pCTMC, as studied in [13,14]. 
In contrast to this work, we aggregate over the multiple input combinations, as 
specified in Problem 1.2. 

To obtain solutions for the probability at a specified time 7Tf, we used nu¬ 
merical integration of the CME. Specifically, we used the Visual GEC software 
(http: //research.microsoft. com/gec) to encode the CRNs and then integrate 
the CME for each combination of inputs. 

To solve the maximisation problem, we used a Markov chain Monte Carlo 
(MCMC) method, as implemented in the Filzbach software (http: //research, 
microsoft. com/f ilzbach). Filzbach uses a variation of the Metropolis-Hastings 
(MH) algorithm to perform Bayesian parameter inference. The MH algorithm 
is used to approximate the posterior probability of a parameter set from a hy¬ 
pothesised model taking on certain values, constrained by a likelihood function. 
The probability of each parameter value is then approximated by constructing a 
Markov chain of sampled parameter sets, such that a proposed parameter set is 
accepted with some probability, based on the ratio of the likelihood function eval¬ 
uated at current and proposal parameter sets. For more information on MCMC 
methods, see [19]. MCMC methods, such as simulated annealing, have also been 
shown to efficiently find solutions to combinatorial optimisation problems [20], 
taking a stochastic search approach similar to the MH algorithm. Stochastic 

3 At present, our uniqueness constraint does not consider other CRN isomorphisms but 
certain species symmetries are broken by the specification <E>, 
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search can provide benefits over gradient-based optimisers by maintaining a 
nonzero probability of making up-hill moves, protecting against getting stuck in 
poor local optima. To use Filzbach for providing solutions to optimisation CRN 
parameters, it is sufficient to encode the argument of Problem 1.2 as a likelihood 
function. Subsequently, we generate MCMC chains with suitably many burn-in 
iterations and samples to obtain an approximate optimising parameter set k. 


4.3 Calculating expected time 

To evaluate the temporal performance of a CRN algorithm C, we make use of 
Markov chain theory to obtain the expected time until a terminal state is reached. 
This is an exact measure of the expected running time for a given pCTMC with 
inputs i £ X, as opposed to using the mean of many stochastic simulations [10]. 

Let A C X r be the absorbing states of a pCTMC — (X, ttq, Q p ) and let 
t a be a vector of expected hitting times, corresponding to the expected time of 
transitioning from a state x £ X r to A. Then t a can be evaluated as the solution 
to the equations (page 113 of [21]) 


t a = 0 for x £ A 
- q x ,x' T %> — 1 f°r x i A. 

x'ex r 


Numerical solutions can be obtained by forming a matrix W where the rows 
and columns of Qp corresponding to the terminal states (A) have been removed. 
Then, r A is the solution to IVt' = 1, where 1 is the vector of l's. Numerical 
solutions can be obtained using Gaussian elimination. 

Note that the time complexity analysis of CRNs typically assumes a volume 
n equal to the maximum number of molecules in the system at any time [8] 
(equivalent to parallel time in PPs [10]). This volume can be included by dividing 
each propensity by n before calculating expected time (see Section 2). In the case 
of bimolecular CRNs this is equivalent to multiplying t a by n. 


5 Case studies 

5.1 Approximate majority 

Approximate Majority is one of the most analysed functions in distributed com¬ 
puting. It is the approximate version of the majority problem, which cannot be 
exactly computed by bimolecular CRNs (or population protocols) with less than 
4 species [22], For CRNs with 2 and 3 species there are known optimal (in terms 
of reaction firings) approximate algorithms [15,16]. 
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We specify the majority problem using the path predicate (see Section 2): 
®AM(a,b) := (<£o( fl / b),(p F (a, b)), where 


<po (a,b) 


<pF(a,b) 


where A™ b 


= a A B — b A X — 0 

(K] b *«>* 

: = if "<^ 

l v otherwise 

A — a + b A B = 0 and 


B™ b : =A=0AB = a+b 


if N — 2, 

if At = 3 


We used inputs a, b £ [1...5] 2 U [6 ... 10] 2 for both optimisation and synthe¬ 
sis. We applied the SMT approach to identify all CRNs with 2 to 4 reactions and 
2 or 3 species that satisfy for K <5 stutter steps (for N species and M reac¬ 
tions, there are ( N ^ l-1 ) total possible CRNs). We used a short optimisation 
(20 burn-in, 20 samples) and sorted these solutions by the value of P& AM for each. 
We then applied a longer optimisation (700 burn-in, 700 samples) to the top 10 
CRNs (Fig. 1). 



Fig. 1. Performance of approximate majority circuits. The SMT-based method was ap¬ 
plied to the approximate majority specification for CRNs with 2, 3 and 4 reactions. For 
each category, the top 10 CRNs satisfying are ordered by their average probabil¬ 
ity after a short optimisation (20 burn-in, 20 samples; red bars). A longer optimisation 
(700 burn-in, 700 samples; green bars) was also performed. We also show the average 
probabilities before optimisation (all rates equal to 1.0; blue bars). The dashed line is 
the average probability of CRN AM 3 4 #448 after the longer optimisation, 0.8999, the 
maximum average probability in this trail. 


Using our approach, we found 1 CRN with 2 reactions and 2 species, the 
known direct competition (DC) network [23] (Fig. 2a). Out of 59,640 possible CRNs 
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a AM 2 , 2 #0 

A + B^tB + B 

a + b^4a + a 

b AM 3/3 #39 

A + B^X + I 
A + X^A + A 
B + X^B + B 

c AM 3/ 3 #28 

B + X^B + B 
X + X-^ A + A 
a + b^4x+x 

d AM 3/4 #162 

17 9 

B + X^B + B 

a+b^4x+x 
A + B A + X 
A + X^4 A + A 

e AM 3/4 #174 

b + x^4 b + b 

A + B 114 B + X 

a + b^4a + x 

A + X^l+A + A 


Before optimisation (rate 1.0) After optimisation 



Input A 





10 4 



10 3 


In) 



* ? x 58 


10° 10’ 10 2 10 3 
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Fig. 2. Response of Approximate Majority algorithms to varied inputs. For each input 
combination, specified as initial copies of species A and species B, the probability that 
both have the correct molecule count after 100 time units is reported. Results are shown 
for a variety of networks that performed well following optimisation (see Fig. 1). The 
performance of each CRN is compared both before optimisation (all rates equal to 1.0; 
left panels) and after long optimisation (central panels). The grey boxes show the input 
ranges used for both generation and optimisation. The expected time until the CTMC 
reaches a terminal state is calculated for varying total molecule counts (n) (right panels). 
These times consider rates scaled as if occurring in a volume n (see Section 4.3). The 
completion times for three alternative initial configurations (initial copies of A were 10%, 
60% and 90% of n respectively) were calculated, illustrating minor differences in circuit 
completion times (x marks systems using optimised rates and • marks systems using 1.0 
for all rates). 
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with 3 species and 3 reactions, the SMT solver found 39 CRNs where &am was 
satisfied, 2 of which with probability over 0.696 after the short optimisation (see 
Fig. 1). These two networks (AM 3 3 #24 and AM 3 3 #28) are the dual of each 
other and behave asymmetrically but perform well owing to a compensatory 
asymmetric parameterisation (Fig. 2c). One might expect that we should discover 
the known approximate majority circuit [15,17], (see Fig. 2b). However, this CRN 
does not satisfy the specification &am since, for input (A — 1, B = 1, X = 0) the 
network terminates in the state (A = 0, B = 0, X = 2) and thus fails to make a 
decision. If we remove this single problematic input from the specification <&AMr 
then this CRN is indeed discovered. We include it for comparison as AM 3 3 #39. 
Note that it scores a 0 on inputs A — 1, B — 1. 

By increasing the number of reactions to 4, the SMT solver found 515 satisfy¬ 
ing networks out of the 1,028,790 possible ones. The top 5 networks, AM^fi #448, 
#328, #445, #333, and #257 have the same rules as the 3 reaction network AM 3 3 
#39 but each has a different 4th reaction. The network AM 3 4 #162 had a lower 
performance than AM 3 3 #39 before optimisation and was almost as good fol¬ 
lowing optimisation. This network was also asymmetric, with a corresponding 
asymmetric parameterisation after optimisation (Fig. 2d). The known 4 reaction 
network AM 3 4 #174 [17] (Fig. 2e) is also identified in 10th position. 

Finally, we analysed the expected time until termination for each circuit, 
using the procedure in Section 4.3 (right-hand panels of Fig. 2). Note that Defini¬ 
tion 2 does not reward circuits that reach a high probability before the final time 
tf — 100. However, in nearly all cases, the estimated hitting time of each system 
was improved by optimisation. 


Computation times The computation times of our procedure depend on the 
size of the circuit (M and N), length of considered computations ( K ) and exact 
specification (including the number of given path predicates). We illustrate the 
computation times required for the SMT-based synthesis part of our approach 
with the majority decision-making CRNs (Fig. 3). 

To determine how the CME calculation used in our method scales with 
molecular copy numbers, we first ran calculations of the CME for the established 
3-reaction approximate majority CRN (system A M 3 3 #39). The calculation was 
initialised with 0.6/? copies of A and 0.4/? copies of B, and all rates were set to 
1. As increasing the copy number decreases the simulation time interval over 
which there are transient dynamics, we integrated the CME over the time interval 


0, ™ J, where n is the total copy number. We calculated transient probabilities 

at 500 output points, with n € [10,1000]. This led to state-spaces of varying 
size, up to 10 6 , with all calculations completing within 7200 seconds (2 hours). 
Smaller examples took only a few seconds. 

We can approximate the total run-time for parameter tuning as a function 
of the number of iterations of the MCMC algorithm and the number of input 
combinations assessed. For example, doing 200 iterations over 10 input combi¬ 
nations which all have below 30 total molecules (<1 s each) suggests a tuning 
procedure of no more than 2,000 seconds. 
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Fig. 3. Computation times for the SMT-based synthesis of majority decision-making 
CRNs. Panel (a) shows the time required to generate a number of solutions (candidate 
CRNs) for Oam ^ or N species and M reactions (denoted AM^,m) for N,M E {3,4} 2 . The 
computation was halted after 2 hours. Panel (b) shows the number of solutions found as 
K (the length of considered computations with stutter transitions) increases. 


5.2 Division 


Division is a non-semi-linear function and therefore it cannot be stably computed 
by CRNs [ 8 ]. However, CRNs have been proposed that might implement the 
calculation of a ratio [24], which allows plants to ration starch reserves during 
seasonally changing nights. 

We specify the division problem using the path predicate (see Section 2): 


®Div(a,b) := (<po(fl, 6 ), #(«,&), where 


cp 0 {a,b) 


A = a AB — b AX — Q if N = 3 

A = aAB — bAX — OAY — 0 if N = 4 


$F(a,b) X — 


a 

.b 


\ 


We chose the input ranges a, b G [1,..., 10 ] 2 for synthesis and optimisation 
to give diverse selection of responses and to reinforce that [^\ — 0 when a < b. 
We applied the SMT approach to CRNs that satisfied <t>Div with K < 20 (without 
stutter transitions). For 3 species and 3 reactions, 22 CRNs were discovered. For 
4 species and 3 reactions, 34 CRNs were discovered. For 4 species and 4 reactions 
the first 105 CRNs were discovered. Of these, only one CRN DIV 4 3 # 29 exceeded 
an average probability of 0.5, though in most cases, optimisation improved 
performance substantially (Fig. 5). For many of the generated circuits, high 
performance was observed only for b > a, which should always evaluate to 0 , 
with poor performance for the nonzero output cases of a > b (Fig. 6 a,b). Note that 
Div 4 3 #29 is so far the top scoring divider CRN in this class. Clearly, none of these 
circuits can be considered as good algorithms for computing division, though 
our procedure was able to detect some very simple yet mediocre circuits in an 
automated way. It is possible that better circuits will be found by considering 
CRNs with more reactions, species, and longer computation paths. 
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Fig. 4. Transient probability calculation times for CRN AM 33 #39. Times indicated 
include the enumeration of the state-space, construction of a sparse matrix, then numerical 
integration in the interval [0, ™], where n is the total molecule count. A single calculation 
was conducted for each value of n. 


6 Discussion 

In this paper, we presented a computational approach for the synthesis and 
parameter tuning of CRNs, given a specification of the system's correctness. 
We focused on the sub-class of bimolecular CRNs due to their importance 
as representations of various molecular algorithms and population protocols. 
However, our approach is more general and could also be applied directly 
to the synthesis of CRNs from other classes (e.g. unimolecular, trimolecular, 
etc.), which are defined through different stoichiometry constraints. The CRNs 
we synthesize can be converted into equivalent physical implementations, for 
example using DNA strand displacement (DSD) [2,3]. However, our approach 
could also be applied directly to synthesize DSD systems through additional 
structural constraints. This could lead to simpler designs than the ones obtained 
through direct translation of CRNs. 

We considered simple reachability properties defined in terms of predicates 
on the initial and final states of a computation which are sufficient to express var¬ 
ious logical and arithmetic functions and operations. More general specifications, 
for example where intermediate states along computations are specified, are also 
currently possible within our approach but extensions to more expressive lan¬ 
guages, such as the probabilistic temporal logics used with other methods [14], 
remains a direction for future work. 

An alternative approach to the problem of realising arbitrary behaviour in 
biochemical systems is to use directed evolution [25,26] In silico evolutionary 
search strategies might scale to larger CRNs and address the synthesis and 
parameter optimisation sub-problems using a single, combined procedure. How¬ 
ever, this comes at the cost of completeness, where the absence of a solution does 
not mean a solution does not exist. In contrast, our method addresses the sub- 
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Fig. 5. Performance of division circuits. The SMT-based method was applied to 
the division specification for CRNs with N species and M reactions for N,M E 
{(3,3), (3,4), (4,4)}. This figure shows the optimisation results for the top 7 CRNs in 
each category. The results are ranked and sorted by their average probability of being 
correct in the grey shaded zone after being optimised for 50 MCMC sample and burn-in 
steps (red bars). If a CRN scored an average probability of over 0.5 then it was optimised 
for a further 200 MCMC burn-in and sample steps. The average probability is shown for 
satisfying CRNs before optimisation (all rates equal to 1.0; blue bars). 


problems separately and uses the SMT solver and theorem prover Z3 to identify 
CRNs that satisfy a given specification (kinetics are ignored at this first stage). 
Since the results provided by Z3 are complete (for a sufficiently large K), the 
termination of the procedure with no solutions is a "proof" that no CRNs exist 
in the given class. Thus, besides providing a practical tool for the identification 
of CRNs with given behaviour, the completeness property means our approach 
could also help explore the theoretical limits of CRN computation (e.g. no CRNs 
with less than M species and N reactions that compute a given function exists). 
For many applications, elements of our method could be complementary with 
evolutionary algorithms. For example, the exact CTMC methods we use to assess 
the probability of correct computations in a given CRN could provide a useful 
fitness function for evolutionary search, compared to alternative approximate 
methods based on stochastic simulation. 

The fully automated generation of "good" CRNs is a challenging problem 
and certain scalability limitations of our current method must be addressed to 
provide a more complete solution. Firstly, the SMT-based synthesis procedure 
we propose may represent large or infinite state spaces and handle systems with 
large molecule numbers. Flowever, currently this method is limited to relatively 
small CRNs with few reactions, species, and which have short computation 
paths. Secondly, the CTMC methods we apply require an explicit represen¬ 
tation of the state space, which must be finite (which is always the case for 
biomolecular CRNs initialised with a finite number of molecules) and contain 
few reachable states — this makes the method suitable for systems involving 
relatively few species and numbers of molecules. To circumvent the need for an 
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Fig. 6. Response of Division algorithms to varied inputs. For each input combination, 
specified as initial copies of species A and species B, the probability that the molecule 
count of X is [A/B\ after 100 time units is reported. Results are shown for the top 
network in each combination of species and reactions (see Fig. 5). The performance of 
each CRN is compared both before optimisation (all rates equal to 1.0; left panels) and 
after optimisation (right panels). 


explicit representation of the state space, stochastic dynamical behaviour could 
be approximated by averaging multiple trajectories from Gillespie's stochastic 
simulation algorithm [27], using fluid or central limit approximations [28], or 
using ordinary differential equations. Depending on the specification, and the 
nature of the CRN, some of these approaches might be appropriate, but none are 
free of their own documented limitations. Finally, the large number of solutions 
identified at the synthesis stage of our approach makes the parameter tuning 
phase challenging and indicates that additional constraints describing more 
accurately the structure and dynamics of "good" solutions could improve the 
method. 

For tuning reaction rates, alternative cost functions could be used that reward 
solutions that are "nearly" correct, e.g. using a mean-squared error. This would 
be most appropriate in high copy number situations, where a precise number of 
molecules is not integral. Our approach is more appropriate for systems operat¬ 
ing at low copy numbers, offering an exact characterisation of the probability 
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that a specific predicate is satisfied. Our results were shown for calculations at 
tf — 100 time units, a transient probability, rather than at the stationary distri¬ 
bution. While the selection of /y is subjective, it allows a circuit programmer to 
specify how long they are willing to wait for a computation. Circuits that reach 
high probability at f > tf will not be rewarded. However, a natural extension to 
the presented method would be to reward circuits that reach high probability at 
t < tf, both imposing an upper bound on time and optimising within that range. 
This could be achieved by integrating our metric over the interval [0, tj]. 

Automating the search for CRNs that compute the solution for a specified 
problem would be beneficial to both theoretical and experimental molecular 
programmers. Our method can be used to show the existence or absence of 
CRNs of a certain size and also suggest CRNs that can be tuned for a specific 
input range, and so become candidate designs for experimental construction. 
Prior to construction, more in-depth analysis of the candidate CRNs produced is 
beneficial, including parameter sensitivity/robustness analysis and bifurcation 
analysis (where appropriate). Future work could also incorporate notions of 
robustness into the proposed method, for example by using interval-based 
methods [14]. Our results illustrate the potential of this approach on several 
examples, including the majority and division functions discussed here. 
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